We are very grateful to the reviewers for their helpful feedback and suggestions, and are pleased to have received a

Neural Information Processing Systems

Our responses to the main concerns are given as follows. [...] Section 4.4 for a related discussion and generalizations to non-unit norms. We would be happy to move some of the less central corollaries (e.g., Sections 4.2 and 4.5) to the [...] We will also correct the typo in Line 202.


Reviews: On the Convergence Rate of Training Recurrent Neural Networks

Neural Information Processing Systems

This paper shows that GD/SGD can minimize the training loss of RNNs at a linear convergence rate, provided the hidden layer width is sufficiently large (polynomial in the data size and the time horizon length). To prove this, the authors show that, within a small region around the initialization, the squared norm of the gradient is lower bounded by the function value (Theorem 3). The authors further show that the loss function is somewhat smooth (Theorem 4), which guarantees that moving in the negative gradient direction can decrease the function value. The paper builds new techniques for analyzing multi-layer ReLU networks and shows that, with appropriate initialization, ReLU activations avoid exponential explosion and exponential vanishing.
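
The two ingredients named in this review (a gradient lower bound and a smoothness property) combine into a linear rate by a standard argument. The sketch below is only illustrative: it assumes ordinary L-smoothness in place of the paper's "somewhat smooth" condition, and the constants \mu, L, and step size \eta are hypothetical symbols that do not appear in the review.

```latex
% Illustrative assumptions (not the paper's exact statements):
%   gradient lower bound:  \|\nabla f(W)\|^2 \ge \mu\, f(W)  near the initialization, with f \ge 0
%   L-smoothness:          f(W') \le f(W) + \langle \nabla f(W),\, W' - W \rangle + \tfrac{L}{2}\,\|W' - W\|^2
% Gradient step:           W_{t+1} = W_t - \eta\, \nabla f(W_t)
\begin{align*}
f(W_{t+1})
  &\le f(W_t) - \eta\,\|\nabla f(W_t)\|^2 + \frac{L\eta^2}{2}\,\|\nabla f(W_t)\|^2
     && \text{(smoothness)} \\
  &= f(W_t) - \frac{1}{2L}\,\|\nabla f(W_t)\|^2
     && \text{(choosing } \eta = 1/L\text{)} \\
  &\le \Bigl(1 - \frac{\mu}{2L}\Bigr)\, f(W_t)
     && \text{(gradient lower bound)}
\end{align*}
% Iterating: f(W_t) \le (1 - \mu/(2L))^t f(W_0), as long as the iterates
% remain in the region where the gradient lower bound holds.
```

Under these simplified assumptions the loss contracts by a constant factor per step, which is what "linear convergence" means here; the review indicates that the paper's actual argument relies on a weaker smoothness notion and on the iterates staying near the initialization.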


Reviews: Covariate-Powered Empirical Bayes Estimation

Neural Information Processing Systems

This theory paper provides a number of novel results, including a theoretical analysis of minimax bounds and an empirical analysis, for combinations of relatively simple statistical estimators and machine learning models of covariate information. The paper shows that these combinations improve on both the simple estimator alone and the machine learning model alone. The main concern raised by the reviewers is that the paper provides limited empirical validation. I disagree with this assessment, as the paper should be seen as a machine learning theory paper. As the proposed framework includes a number of advanced machine learning models, including XGBoost, it should be very relevant for the NeurIPS community.
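
To make the idea of combining a simple estimator with a covariate-based model concrete, here is a minimal sketch using textbook normal-means shrinkage toward a prediction. It is not taken from the paper: the Gaussian model, the known noise variance `sigma2`, the method-of-moments weight, and the names `shrink_toward_prediction` and `simulate` are all assumptions made for illustration, and a fixed linear function stands in for a fitted machine learning model.

```python
import random


def shrink_toward_prediction(z, m, sigma2):
    """Shrink noisy observations z toward covariate-based predictions m.

    Illustrative model (an assumption, not taken from the paper):
    z[i] ~ N(mu[i], sigma2) with known sigma2 > 0, and mu[i] ~ N(m[i], A).
    """
    n = len(z)
    # The mean squared residual estimates A + sigma2; subtract sigma2, clip at 0.
    resid2 = sum((zi - mi) ** 2 for zi, mi in zip(z, m)) / n
    a_hat = max(resid2 - sigma2, 0.0)
    w = a_hat / (a_hat + sigma2)  # weight placed on the raw observation
    return [mi + w * (zi - mi) for zi, mi in zip(z, m)]


def mse(est, truth):
    """Mean squared error between two equal-length lists."""
    return sum((e - t) ** 2 for e, t in zip(est, truth)) / len(truth)


def simulate(n=10000, seed=0):
    """Return (MSE of z alone, MSE of m alone, MSE of the combination)."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    m = [2.0 * xi for xi in x]  # hypothetical stand-in for ML predictions
    mu = [mi + rng.gauss(0.0, 0.5) for mi in m]  # true effects, A = 0.25
    z = [mui + rng.gauss(0.0, 1.0) for mui in mu]  # noisy observations, sigma2 = 1
    eb = shrink_toward_prediction(z, m, sigma2=1.0)
    return mse(z, mu), mse(m, mu), mse(eb, mu)


if __name__ == "__main__":
    print(simulate())
```

In this simulated setup the raw observations have error variance 1 and the predictions have error variance 0.25, while the Gaussian model gives the combination an expected squared error of roughly 0.25 · 1 / (0.25 + 1) = 0.2, so the combination should beat both components on their own.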